Bayesian co-clustering
نویسندگان
چکیده
Co-clustering means simultaneously identifying natural clusters in different kinds of objects. Examples include simultaneously clustering customers and products for a recommender application; simultaneously clustering proteins and molecules in microbiology; or simultaneously clustering documents and words in a text mining application. Important insights into a problem can be gained by understanding the interactions between clusters for the different kinds of objects. This paper considers Bayesian models for co-clustering. The Bayesian approach begins by developing a model for the data generating process, and inverting that model through Bayesian inference to infer cluster membership, learn characteristics of the clusters, and fill in missing observations. We consider a basic Bayesian clustering model and several extensions to the model. Experimental evaluations and comparisons among the clustering methods are presented. © 2015 Wiley Periodicals, Inc.
منابع مشابه
Latent Dirichlet Bayesian Co-Clustering
Co-clustering has emerged as an important technique for mining contingency data matrices. However, almost all existing coclustering algorithms are hard partitioning, assigning each row and column of the data matrix to one cluster. Recently a Bayesian co-clustering approach has been proposed which allows a probability distribution membership in row and column clusters. The approach uses variatio...
متن کاملNonparametric Bayesian Co-clustering Ensembles
A nonparametric Bayesian approach to co-clustering ensembles is presented. Similar to clustering ensembles, coclustering ensembles combine various base co-clustering results to obtain a more robust consensus co-clustering. To avoid pre-specifying the number of co-clusters, we specify independent Dirichlet process priors for the row and column clusters. Thus, the numbers of rowand column-cluster...
متن کاملNonparametric Bayesian Models for Unsupervised Learning
NONPARAMETRIC BAYESIAN MODELS FOR UNSUPERVISED LEARNING Pu Wang, PhD George Mason University, 2011 Dissertation Director: Carlotta Domeniconi Unsupervised learning is an important topic in machine learning. In particular, clustering is an unsupervised learning problem that arises in a variety of applications for data analysis and mining. Unfortunately, clustering is an ill-posed problem and, as...
متن کاملPAC-Bayesian Generalization Bound for Density Estimation with Application to Co-clustering
We derive a PAC-Bayesian generalization bound for density estimation. Similar to the PAC-Bayesian generalization bound for classification, the result has the appealingly simple form of a tradeoff between empirical performance and the KL-divergence of the posterior from the prior. Moreover, the PACBayesian generalization bound for classification can be derived as a special case of the bound for ...
متن کاملPAC-Bayesian Analysis of Co-clustering and Beyond
We derive PAC-Bayesian generalization bounds for supervised and unsupervised learning models based on clustering, such as co-clustering, matrix tri-factorization, graphical models, graph clustering, and pairwise clustering.1 We begin with the analysis of co-clustering, which is a widely used approach to the analysis of data matrices. We distinguish among two tasks in matrix data analysis: discr...
متن کامل